Abstract

Using the notion of consistency of empirical measures, we prove, under fairly general assumptions, a joint large deviation principle in n for the empirical pair measure and the empirical offspring measure of multitype Galton-Watson trees conditioned to have exactly n vertices, in the weak topology. From these results we obtain a large deviation principle for the empirical pair measure of Markov chains indexed by simply generated trees obtained by conditioning Galton-Watson trees on the total number of vertices. For the case where the offspring law of the tree is the geometric distribution with parameter 1/2, we get an explicit rate function. All our rate functions are expressed in terms of relative entropies.
1. Introduction

Recently, conditioned Galton-Watson trees (CGWT) have received increasing attention from researchers (see, e.g., [A191a], [A191b], [A193], [TjGu2], [DU03], [KI11a], [KI11b], [RD11]) because of their ability to model fairly well phenomena which occur in natural hierarchies, for example, mutations in mitochondrial DNA [OS02].

In the article [KI11a] a new type of conditioning, involving the number of leaves of the Galton-Watson tree, was introduced in order to analyse a specific discrete probabilistic model, namely dissections of a regular polygon with Boltzmann weights. [RD11] dealt with the more general conditioning on having a fixed number of vertices with degree in a given set. Further, the article [KI11b] investigated the asymptotic behaviour of critical Galton-Watson trees whose offspring distribution may have infinite variance, conditioned on having a large fixed number of vertices with degree in a given set.

Large deviation analysis of Galton-Watson trees conditioned on the total size was first carried out by Dembo, Mörters and Sheffield [DMS03]. In that article, the notions of shift-invariance and specific relative entropy, as typically understood for Markov fields on deterministic graphs such as $\mathbb{Z}^d$, were extended to Markov fields on random trees. With these concepts, large deviation principles for empirical measures of a class of random trees, including multitype Galton-Watson trees conditioned to have exactly n vertices, were proved in a topology stronger than the weak topology. Their analysis showed that large deviation results which are well known for classical Markov chains can be extended to Markov chains indexed by random trees with offspring laws which have superexponential decay at infinity, i.e. offspring laws $p(\cdot)$ with all exponential moments finite.

In the course of the proof of their main result, a large deviation principle for the empirical offspring measure of multitype Galton-Watson trees (see [AN72]) whose exponential moments are all finite was established in their topology, see [DMS03].

However, their result does not cover offspring laws whose tails decay exponentially or sub-exponentially at infinity. An example is a Markov chain indexed by a tree with geometric offspring law, which may be described as follows: first we sample a tree from the Galton-Watson trees with geometric offspring distribution with parameter 1/2, and then, given this tree, we run a Markov chain on the vertices of the tree in such a way that the state of a vertex depends only on the state of its parent. The result of this two-step experiment can also be interpreted as a typed tree.

The aim of this article is to carry out a non-trivial extension of the large deviation principle for tree-indexed Markov chains of [DMS03, Theorem 2.2] to cover offspring laws not discussed in [DMS03].

To be specific, we prove a joint large deviation principle for the empirical pair measure and the empirical offspring measure of multitype Galton-Watson trees with offspring laws which have finite second moments. This includes the offspring laws considered in [DMS03].

To deal with the problem of exponential tightness in the strong topology encountered in [DMS03], which necessitated the use of a strong moment condition, we extend the concept of consistency, as understood for empirical measures of coloured random graphs (see Doku and Mörters [DM10] or Doku [DA06]), to multitype Galton-Watson trees. With this concept, we prove the upper bound in the weak topology, under the finite second moment assumption, using the technique of (exponential) change of measure and two large deviation results, [DMS03, Lemma 3.6] and [DMS03, Theorem 2.2].

Our proof of the lower bound, unlike that of the upper bound, uses the technique of approximating a given multitype Galton-Watson tree from below by another multitype Galton-Watson tree, which we obtain by restricting the offspring distribution to some bounded set $\mathcal{X}_k^*$. Taking an appropriate limit as k goes to infinity we obtain asymptotic results about the full tree.

Using the contraction principle, see Dembo and Zeitouni [DZ98], we derive from our main results a large deviation principle for the empirical pair measure of Markov chains indexed by random trees, see Benjamini and Peres [BP94]. This result is similar to the one in [DMS03]. We remark here that the process level large deviation principles for the empirical subtree measure and the single-generation empirical measure, see [DMS03], can be developed from our main results.

Specifically, we consider random tree models where trees and types are chosen simultaneously according to a multitype Galton-Watson tree. We recall from [DMS03] the model of the multitype Galton-Watson tree. For a finite alphabet $\mathcal{X}$, we write $\mathcal{X}^* = \bigcup_{n=0}^{\infty} \{n\} \times \mathcal{X}^n$ and equip it with the discrete topology. We denote by $\mathcal{T}$ the set of all finite rooted planar trees $T$, by $V = V(T)$ the set of all vertices and by $E = E(T)$ the set of all edges oriented away from the root, which is always denoted by $\rho$. We write $|T|$ for the number of vertices in the tree $T$. We note that the offspring of any vertex $v \in T$ is characterized by an element of $\mathcal{X}^*$ and that there is an element $(0, \emptyset)$ in $\mathcal{X}^*$ symbolizing absence of offspring. For each typed tree $X$ and each vertex $v$ we denote by

$$C(v) = (N(v), X_1(v), \ldots, X_{N(v)}(v)) \in \mathcal{X}^*$$

the number and types of the children of $v$, ordered from left to right.

Given a probability measure $\mu$ on $\mathcal{X}$, serving as the initial distribution, and an offspring transition kernel $Q$ from $\mathcal{X}$ to $\mathcal{X}^*$, we define the law $\mathbb{P}$ of a tree-indexed process $X$, see Pemantle [Pe95], by the following rules:

• The root $\rho$ carries a random type $X(\rho)$ chosen according to the probability measure $\mu$ on $\mathcal{X}$.
• For each vertex with type $a \in \mathcal{X}$ the offspring number and types are given, independently of everything else, by the offspring law $Q\{\,\cdot \mid a\}$ on $\mathcal{X}^*$. We write

$$Q\{\,\cdot \mid a\} = Q\{(N, X_1, \ldots, X_N) \in \cdot \mid a\},$$

i.e. we have a random number $N$ of offspring particles with types $X_1, \ldots, X_N$.
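The two rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary encoding of the initial law and of the offspring kernel, the function name `sample_typed_tree`, and the `max_vertices` safety cap are hypothetical choices, not part of the model. An element of the offspring space is represented by the tuple of child types, so the empty tuple plays the role of the element symbolizing absence of offspring.

```python
import random


def sample_typed_tree(mu, Q, rng, max_vertices=10_000):
    """Sample a typed tree in breadth-first order.

    mu : dict mapping type -> probability (initial distribution, an assumption
         about the encoding).
    Q  : dict mapping type a -> dict {tuple of child types: probability}.
    Returns a list of (type, children_types) pairs, one per vertex, or None
    if the tree grows beyond max_vertices (hypothetical safety cap).
    """
    def draw(dist):
        outcomes = list(dist)
        weights = [dist[o] for o in outcomes]
        return rng.choices(outcomes, weights=weights, k=1)[0]

    pending = [draw(mu)]          # types of vertices whose offspring is not yet drawn
    tree = []
    i = 0
    while i < len(pending):
        if len(pending) > max_vertices:
            return None
        a = pending[i]
        c = draw(Q[a])            # offspring number and types, given the type a
        tree.append((a, c))
        pending.extend(c)         # children are processed later, left to right
        i += 1
    return tree


rng = random.Random(0)
mu = {"a": 1.0}
Q = {"a": {(): 0.5, ("a", "b"): 0.5}, "b": {(): 1.0}}
tree = sample_typed_tree(mu, Q, rng)
```

Every vertex except the root is a child of exactly one vertex, so the number of vertices equals one plus the total number of children, which gives a quick sanity check on any sampled tree.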

For every $c = (n, a_1, \ldots, a_n) \in \mathcal{X}^*$ and $a \in \mathcal{X}$, the multiplicity of the symbol $a$ in $c$ is given by

$$m(a, c) = \sum_{i=1}^{n} 1_{\{a_i = a\}},$$

and the matrix $A$ with index set $\mathcal{X} \times \mathcal{X}$ and nonnegative entries is given by

$$A(a, b) = \sum_{c \in \mathcal{X}^*} Q\{c \mid b\}\, m(a, c), \quad \text{for } a, b \in \mathcal{X},$$

i.e. $A(a, b)$ is the expected number of offspring of type $a$ of a vertex of type $b$. We also recall from [DMS03] the weak form of the irreducibility concept. With $A^*(a, b) = \sum_{k=1}^{\infty} A^k(a, b) \in [0, \infty]$ we say that the matrix $A$ is weakly irreducible if $\mathcal{X}$ can be partitioned into a non-empty set $\mathcal{X}_r$ of recurrent states and a disjoint set $\mathcal{X}_t$ of transient states such that

• $A^*(a, b) > 0$ whenever $b \in \mathcal{X}_r$, while

• $A^*(a, b) = 0$ whenever $b \in \mathcal{X}_t$ and either $a = b$ or $a \in \mathcal{X}_r$.

For example, any irreducible matrix $A$ has $A^*$ strictly positive, hence is also weakly irreducible with $\mathcal{X}_r = \mathcal{X}$. The multitype Galton-Watson tree is called weakly irreducible (or irreducible) if the matrix $A$ is weakly irreducible (or irreducible, respectively) and the number $\sum_{a \in \mathcal{X}_t} m(a, c)$ of transient offspring is uniformly bounded under $Q$.

Recall that, by the Perron-Frobenius theorem, see e.g. [DZ98, Theorem 3.1.1], the largest eigenvalue of an irreducible matrix is real and positive. Obviously, the same applies to weakly irreducible matrices. The multitype Galton-Watson tree is called critical if this eigenvalue is 1 for the matrix $A$.
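As a small illustration of these definitions, the following sketch computes the matrix of expected offspring numbers for a kernel with finite support and estimates its largest eigenvalue. The dictionary encoding of the kernel and the helper names are assumptions made for the example; the eigenvalue is approximated by power iteration on the matrix shifted by the identity (a standard device to avoid periodicity), which is only expected to behave well when the matrix is irreducible.

```python
def mean_matrix(Q, types):
    """A[a][b] = sum over c of Q{c | b} * m(a, c), with m(a, c) = c.count(a)."""
    return {
        a: {b: sum(p * c.count(a) for c, p in Q[b].items()) for b in types}
        for a in types
    }


def largest_eigenvalue(A, types, iters=500):
    """Power iteration on A + I from a positive start vector (assumes A irreducible)."""
    x = {a: 1.0 / len(types) for a in types}
    lam = 0.0
    for _ in range(iters):
        y = {a: x[a] + sum(A[a][b] * x[b] for b in types) for a in types}
        total = sum(y.values())
        lam = total / sum(x.values())      # growth factor of the l1 norm
        x = {a: y[a] / total for a in types}
    return lam - 1.0                       # undo the shift by the identity


types = ["a", "b"]
# Hypothetical kernel: a vertex has no children or two children of the other type.
Q = {
    "a": {(): 0.5, ("b", "b"): 0.5},
    "b": {(): 0.5, ("a", "a"): 0.5},
}
A = mean_matrix(Q, types)
eig = largest_eigenvalue(A, types)
is_critical = abs(eig - 1.0) < 1e-9
```

In this toy kernel every vertex has on average one child, of the other type, so the matrix has eigenvalues plus and minus one and the tree is critical in the sense above.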

The remaining part of the article is organized in the following manner. The complete statement of our results is given in Section 2: we begin with the joint LDP for empirical pair measures and empirical offspring measures of multitype Galton-Watson trees, followed by a corollary giving the LDP for the empirical offspring measure of multitype Galton-Watson trees, in subsection 2.1. In subsection 2.2 we state the LDP for empirical pair measures of Markov chains indexed by a tree. The proofs of our main results are then given in Section 3. All corollaries and Theorem 2.4 are proved in a later section.

2. Statement of the results 

2.1 Joint large deviation principle for empirical pair measure and empirical offspring 
measure of multitype Galton- Watson trees. 

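For every sample chain $X$, we associate the empirical offspring measure $M_X$ on $\mathcal{X} \times \mathcal{X}^*$, by

$$M_X(a, c) = \frac{1}{|T|} \sum_{v \in V} \delta_{(X(v), C(v))}(a, c), \tag{2.1}$$

and the empirical pair measure $L_X$ on $\mathcal{X} \times \mathcal{X}$, by

$$L_X(a, b) = \frac{1}{|T|} \sum_{e \in E} \delta_{(X(e_1), X(e_2))}(a, b), \quad \text{for } a, b \in \mathcal{X}, \tag{2.2}$$

where $e_1, e_2$ are the beginning and end vertex of the edge $e \in E$ (so $e_1$ is closer to $\rho$ than $e_2$). We note that

$$L_X(a, b) = \sum_{c \in \mathcal{X}^*} m(b, c)\, M_X(a, c).$$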
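The relation between the two empirical measures can be checked on a small example. The sketch below assumes a typed tree encoded as a list of (type, tuple of child types) pairs, one per vertex; this encoding and the function names are illustrative choices only. Both measures are normalized by the number of vertices.

```python
from collections import Counter


def empirical_measures(tree):
    """Return (L, M): empirical pair and offspring measures of a typed tree."""
    n = len(tree)
    M = Counter()
    L = Counter()
    for a, c in tree:
        M[(a, c)] += 1.0 / n           # one atom per vertex
        for b in c:
            L[(a, b)] += 1.0 / n       # one atom per edge (parent type, child type)
    return dict(L), dict(M)


def pair_from_offspring(M):
    """Compute sum over c of m(b, c) * M(a, c) for every pair (a, b)."""
    L = Counter()
    for (a, c), weight in M.items():
        for b in set(c):
            L[(a, b)] += c.count(b) * weight
    return dict(L)


# Root of type a with children (a, b); the child of type a has one child of type b.
tree = [("a", ("a", "b")), ("a", ("b",)), ("b", ()), ("b", ())]
L, M = empirical_measures(tree)
L_from_M = pair_from_offspring(M)
```

Here the tree has four vertices and three edges, so the pair measure has total mass 3/4 while the offspring measure is a probability vector.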
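By definition, we notice that $M_X$ is a probability vector and that the total mass $\|L_X\|$ of $L_X$ is $(|T| - 1)/|T| < 1$.

Our main result is a large deviation principle for $(L_X, M_X)$ if $X$ is a multitype Galton-Watson tree.

We denote by $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ the space of probability measures $\nu$ on $\mathcal{X} \times \mathcal{X}^*$ with $\int n\, \nu(da, dc) < \infty$, using the convention $c = (n, a_1, \ldots, a_n)$. Denote by $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X})$ the space of finite measures on $\mathcal{X} \times \mathcal{X}$ and endow the space $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ with the weak topology.

We call $(\varpi, \nu) \in \tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ sub-consistent if

$$\varpi(a, b) \ge \sum_{c \in \mathcal{X}^*} m(b, c)\, \nu(a, c) \quad \text{for all } a, b \in \mathcal{X}. \tag{2.3}$$

It is called consistent if equality holds in (2.3). Observe that, if $(\varpi, \nu)$ are the empirical pair measure and the empirical offspring measure of a multitype Galton-Watson tree with $n$ vertices, then both sides of (2.3) are equal to

$$\frac{1}{n} \times \#\{\text{edges with beginning vertex of type } a \text{ and end vertex of type } b\}.$$

We call an offspring distribution $Q$ bounded if for some $k < \infty$, we have

$$Q\{N > k \mid a\} = 0, \quad \text{for all } a \in \mathcal{X}.$$

Otherwise we call it unbounded. To formulate our first LDP, we denote by $\nu_1$ the $\mathcal{X}$-marginal of the probability measure $\nu$ on $\mathcal{X} \times \mathcal{X}^*$ and by $\varpi_2$ the second marginal of the finite measure $\varpi$ on $\mathcal{X} \times \mathcal{X}$.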

Theorem 2.1. Suppose that $X$ is a weakly irreducible, critical multitype Galton-Watson tree with offspring law $Q$ whose second moment is finite, conditioned to have exactly $n$ vertices. Then, for $n \to \infty$, $(L_X, M_X)$ satisfies a large deviation principle in $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ with speed $n$ and the convex, good rate function

$$J(\varpi, \nu) = \begin{cases} H(\nu \,\|\, \nu_1 \otimes Q) & \text{if } (\varpi, \nu) \text{ is sub-consistent and } \varpi_2 = \nu_1, \\ \infty & \text{otherwise,} \end{cases} \tag{2.4}$$

where $\varpi_2$ is the second marginal of the finite measure $\varpi$ and $\nu_1$ is the $\mathcal{X}$-marginal of the probability measure $\nu$.
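To make the rate function concrete, the following sketch evaluates it for measures with finite support. It is an illustration only: the dictionary encodings, the tolerance `tol`, and the function name `rate_J` are assumptions, and the relative entropy is computed with natural logarithms.

```python
import math


def rate_J(w, nu, Q, types, tol=1e-12):
    """Evaluate J(w, nu) = H(nu || nu_1 x Q) under the constraints of the theorem.

    w  : dict {(a, b): mass}, a finite measure on pairs of types.
    nu : dict {(a, c): mass}, a probability measure; c is a tuple of child types.
    Q  : dict {a: {c: probability}}, the offspring kernel.
    """
    nu1 = {a: sum(p for (x, _), p in nu.items() if x == a) for a in types}
    w2 = {b: sum(w.get((a, b), 0.0) for a in types) for b in types}
    # Constraint: second marginal of w equals the type marginal of nu.
    if any(abs(w2[a] - nu1[a]) > tol for a in types):
        return math.inf
    # Constraint: sub-consistency, w(a, b) >= sum_c m(b, c) nu(a, c).
    for a in types:
        for b in types:
            rhs = sum(c.count(b) * p for (x, c), p in nu.items() if x == a)
            if w.get((a, b), 0.0) < rhs - tol:
                return math.inf
    # Relative entropy of nu with respect to nu_1 x Q.
    H = 0.0
    for (a, c), p in nu.items():
        if p == 0.0:
            continue
        q = nu1[a] * Q[a].get(c, 0.0)
        if q == 0.0:
            return math.inf
        H += p * math.log(p / q)
    return H


types = ["a"]
Q = {"a": {(): 0.5, ("a", "a"): 0.5}}
w = {("a", "a"): 1.0}
J_typical = rate_J(w, {("a", ()): 0.5, ("a", ("a", "a")): 0.5}, Q, types)
J_atypical = rate_J(w, {("a", ()): 0.75, ("a", ("a", "a")): 0.25}, Q, types)
```

In this single-type example the first measure coincides with the product of its marginal and the kernel, so the rate is zero, while the second is sub-consistent but not consistent and has a strictly positive, finite rate.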

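For $\nu \in \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$, we write

$$\langle m(\cdot, c), \nu(a, c) \rangle(b) := \sum_{(a, c) \in \mathcal{X} \times \mathcal{X}^*} m(b, c)\, \nu(a, c), \quad \text{for } b \in \mathcal{X},$$

and state a corollary of Theorem 2.1.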

Corollary 2.2. Suppose that $X$ is a weakly irreducible, critical multitype Galton-Watson tree with an offspring law $Q$ whose second moment is finite, conditioned to have exactly $n$ vertices. Then, for $n \to \infty$, the empirical offspring measure $M_X$ satisfies a large deviation principle in $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ with speed $n$ and the convex, good rate function

$$K(\nu) = \begin{cases} H(\nu \,\|\, \nu_1 \otimes Q) & \text{if } \langle m(\cdot, c), \nu(a, c) \rangle \le \nu_1, \\ \infty & \text{otherwise.} \end{cases} \tag{2.5}$$

Here, we remark that the finite second moment assumption in Theorem 2.1 and Corollary 2.2 is necessary for us to establish the subexponential decay of the probability of the event $\{|T| = n\}$ on the set $\{n \in \mathbb{N} : \mathbb{P}\{|T| = n\} > 0\}$. See [DMS03, Lemma 3.1].

We write $\mathcal{X}_k^* = \bigcup_{n=0}^{k} \{n\} \times \mathcal{X}^n$ and denote by $Q_k$ an offspring transition kernel from $\mathcal{X}$ to $\mathcal{X}_k^*$. The next large deviation principle is the main ingredient in the proof of the lower bound of Theorem 2.1.

Theorem 2.3. Suppose that $X$ is a weakly irreducible, critical multitype Galton-Watson tree with an offspring law $Q_k$, conditioned to have exactly $n$ vertices. Then, for $n \to \infty$, $(L_X, M_X)$ satisfies a large deviation principle in $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}_k^*)$ with speed $n$ and the convex, good rate function

$$J_k(\varpi, \nu) = \begin{cases} H(\nu \,\|\, \nu_1 \otimes Q_k) & \text{if } (\varpi, \nu) \text{ is consistent and } \varpi_2 = \nu_1, \\ \infty & \text{otherwise.} \end{cases} \tag{2.6}$$



2.2 LDP for empirical pair measure of Markov chains indexed by trees 

In this subsection, we look at the situation where the tree is generated independently of the types. 

Suppose that $T$ is any finite tree and we are given an initial probability measure $\mu$ on a finite alphabet $\mathcal{X}$ and a Markovian transition kernel $Q : \mathcal{X} \times \mathcal{X} \to [0, 1]$. We can obtain a Markov chain indexed by the tree $T$, $X : V \to \mathcal{X}$, as follows: choose $X(\rho)$ according to $\mu$ and choose $X(v)$, for each vertex $v \neq \rho$, using the transition kernel given the value of its parent, independently of everything else. If the tree is chosen randomly, we always consider $X = \{X(v) : v \in T\}$ under the joint law of tree and chain. It is sometimes convenient to interpret $X$ as a typed tree, considering $X(v)$ as the type of the vertex $v$.



We consider the class of simply generated trees, see [MM78] or [A191a], obtained by conditioning a critical Galton-Watson tree on its total number of vertices. To be specific, we look at the class of Galton-Watson trees where the number of children $N(v)$ of each $v \in T$ is chosen independently according to the same law $p(\cdot) = \mathbb{P}\{N(v) = \cdot\}$ for all $v \in T$, with $0 < p(0) < 1$. We assume that $p$ is critical, that is, the mean offspring number $\sum_{\ell=0}^{\infty} \ell\, p(\ell)$ is one, but this assumption is not restrictive: note that the distribution of $T$ conditioned on $\{|T| = n\}$ is exactly the same as when the offspring law is $p_\theta(\ell) = p(\ell) e^{\theta \ell} / \sum_j p(j) e^{\theta j}$, regardless of the value of $\theta \in \mathbb{R}$. With $0 < p(0) < 1 - p(1)$ there exists a unique $\theta^*$ such that $\sum_\ell \ell\, p_{\theta^*}(\ell) = 1$. Hence all our results hold in the noncritical cases with $p_{\theta^*}$ in place of $p$. We allow offspring laws $p$ with unbounded support, but we relax the assumption that all exponential moments of $p$ are finite, i.e. we relax the assumption $\ell^{-1} \log p(\ell) \to -\infty$.
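The reduction to the critical case can be illustrated numerically. The sketch below tilts a finitely supported offspring law exponentially and finds, by bisection, the tilt parameter that makes the mean equal to one; the finite support, the search bracket and all function names are assumptions made for the example. It also compares the relative weights of two degree sequences with the same number of vertices, which is the reason the conditioned law does not depend on the tilt: every tree with n vertices has n - 1 edges, so tilting multiplies all such weights by a common factor.

```python
import math


def tilt(p, theta):
    """Exponentially tilted law: p(l) * exp(theta * l), renormalized."""
    weights = {l: q * math.exp(theta * l) for l, q in p.items()}
    z = sum(weights.values())
    return {l: v / z for l, v in weights.items()}


def mean(p):
    return sum(l * q for l, q in p.items())


def critical_theta(p, lo=-50.0, hi=50.0, iters=200):
    """Bisection for the tilt with mean one; the mean is increasing in theta."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(tilt(p, mid)) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def tree_weight(p, degrees):
    """Probability weight of a plane tree with the given vertex out-degrees."""
    out = 1.0
    for d in degrees:
        out *= p[d]
    return out


p = {0: 0.5, 1: 0.2, 3: 0.3}           # hypothetical supercritical law, mean 1.1
theta_star = critical_theta(p)
p_star = tilt(p, theta_star)

# Two degree sequences of trees with four vertices (three edges each).
ratio_before = tree_weight(p, [3, 0, 0, 0]) / tree_weight(p, [1, 1, 1, 0])
ratio_after = tree_weight(p_star, [3, 0, 0, 0]) / tree_weight(p_star, [1, 1, 1, 0])
```

The two ratios agree, which reflects the invariance of the conditioned tree law under tilting for this pair of trees.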

We shall assume hereafter that statements conditioned on the event $\{|T| = n\}$ are made only for those values of $n$ for which the event $\{|T| = n\}$ has positive probability.

For each typed tree $X$, we recall from [DMS03] the definition of the empirical pair (probability) measure $\tilde{L}_X$ on $\mathcal{X} \times \mathcal{X}$ as

$$\tilde{L}_X(a, b) = \frac{1}{|E|} \sum_{e \in E} \delta_{(X(e_1), X(e_2))}(a, b), \quad \text{for } a, b \in \mathcal{X}, \tag{2.7}$$

where $e_1, e_2$ are the beginning and end vertex of the edge $e \in E$ (so $e_1$ is closer to $\rho$ than $e_2$). Notice that $L_X = \frac{n-1}{n} \tilde{L}_X$ on the set $\{|T| = n\}$, and hence the LDP for $L_X$ implies the LDP for $\tilde{L}_X$ by the exponential equivalence theorem, see Dembo and Zeitouni [DZ98]. Our first result in this subsection is a large deviation principle for $\tilde{L}_X$.

Theorem 2.4. Suppose that $T$ is a Galton-Watson tree, with offspring law $p(\cdot)$ such that $0 < p(0) < 1 - p(1)$, $\sum_\ell \ell\, p(\ell) = 1$ and $\sum_\ell \ell^2 p(\ell) < \infty$. Let $X$ be a Markov chain indexed by $T$ with arbitrary initial distribution and an irreducible Markovian transition kernel $Q$. Then, for $n \to \infty$, the empirical pair measure $\tilde{L}_X$, conditioned on $\{|T| = n\}$, satisfies a large deviation principle in $\mathcal{M}(\mathcal{X} \times \mathcal{X})$ with speed $n$ and the convex, good rate function

$$I(\mu) = \begin{cases} H(\mu \,\|\, \mu_1 \otimes Q) + \sum_{a \in \mathcal{X}} \mu_2(a)\, I_p\!\left(\frac{\mu_1(a)}{\mu_2(a)}\right) & \text{if } \mu_1 \ll \mu_2, \\ \infty & \text{otherwise,} \end{cases} \tag{2.8}$$

where

$$I_p(x) = \sup_{\lambda \in \mathbb{R}} \Big\{ \lambda x - \log \Big[ \sum_{n=0}^{\infty} p(n) e^{\lambda n} \Big] \Big\}, \tag{2.9}$$

and $\mu_1, \mu_2$ are the first and second marginal of $\mu$ respectively, and $\mu_1 \otimes Q(a, b) = Q\{b \mid a\}\, \mu_1(a)$.

From Theorem 2.4 we obtain the LDP for empirical pair measures of Galton-Watson trees with geometric offspring distribution with parameter 1/2.

Corollary 2.5. Suppose that $T$ is a Galton-Watson tree, with offspring law $p(\ell) = 2^{-(\ell+1)}$, $\ell = 0, 1, \ldots$. Let $X$ be a Markov chain indexed by $T$ with arbitrary initial distribution and an irreducible Markovian transition kernel $Q$. Then, for $n \to \infty$, the empirical pair measure $\tilde{L}_X$, conditioned on $\{|T| = n\}$, satisfies a large deviation principle in $\mathcal{M}(\mathcal{X} \times \mathcal{X})$ with speed $n$ and the convex, good rate function

$$I(\mu) = \begin{cases} H(\mu \,\|\, \mu_1 \otimes Q) + H\big(\mu_1 \,\|\, (\mu_1 + \mu_2)/2\big) + H\big(\mu_2 \,\|\, (\mu_1 + \mu_2)/2\big) & \text{if } \mu_1 \ll \mu_2, \\ \infty & \text{otherwise,} \end{cases} \tag{2.10}$$

where $\mu_1$ and $\mu_2$ are the first and second marginal of $\mu$ and $\mu_1 \otimes Q(a, b) = Q\{b \mid a\}\, \mu_1(a)$.
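The passage from the general rate function to the geometric case can be checked numerically. For the geometric law with parameter 1/2 the logarithmic moment generating function is $-\log(2 - e^{\lambda})$ for $e^{\lambda} < 2$, and maximizing in $\lambda$ gives the closed form $I_p(x) = x \log\frac{2x}{1+x} + \log\frac{2}{1+x}$; this closed form is our own computation and is used here only as a sanity check. The sketch compares it with a direct numerical maximization by ternary search, whose bracket and iteration count are arbitrary choices.

```python
import math


def I_p_numeric(x, iters=200):
    """Numerical Legendre transform for p(n) = 2^-(n+1), via ternary search."""
    def f(lam):
        # lam * x - log sum_n p(n) e^{lam n}, valid for e^lam < 2 (concave in lam)
        return lam * x + math.log(2.0 - math.exp(lam))

    lo, hi = -40.0, math.log(2.0) - 1e-12
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f((lo + hi) / 2)


def I_p_closed(x):
    """Closed form of the same supremum for x > 0."""
    return x * math.log(2 * x / (1 + x)) + math.log(2 / (1 + x))


def entropy_form(mu1, mu2):
    """Pointwise contribution to H(mu1 || m) + H(mu2 || m) with m = (mu1 + mu2)/2."""
    m = (mu1 + mu2) / 2
    return mu1 * math.log(mu1 / m) + mu2 * math.log(mu2 / m)


values = {x: (I_p_numeric(x), I_p_closed(x)) for x in (0.5, 1.0, 2.0)}
lhs = 0.7 * I_p_closed(0.3 / 0.7)      # mu_2(a) * I_p(mu_1(a) / mu_2(a))
rhs = entropy_form(0.3, 0.7)
```

The agreement of `lhs` and `rhs` is the pointwise identity behind the two relative entropy terms in (2.10), and the value zero at x = 1 reflects that the geometric law has mean one.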

3. Proof of Main Results 

3.1 Change of Measure, Exponential Tightness and Some General Principles. 

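Denote by $\mathcal{C}$ the space of bounded functions on $\mathcal{X} \times \mathcal{X}^*$, and for $g \in \mathcal{C}$ define the function

$$U_g(a) = \log \sum_{c \in \mathcal{X}^*} e^{g(a, c)}\, Q\{c \mid a\}, \tag{3.1}$$

for $a \in \mathcal{X}$. Using $g$ we define the following new multitype Galton-Watson tree:

• Assign the root $\rho$ a type $a \in \mathcal{X}$ according to the probability distribution $\mu_g$ given by

$$\mu_g(a) = \frac{e^{U_g(a)} \mu(a)}{\int e^{U_g(b)} \mu(db)}. \tag{3.2}$$

• For every vertex with type $a \in \mathcal{X}$ the offspring number and types are given, independently of everything else, by the offspring law $\tilde{Q}\{\,\cdot \mid a\}$ given by

$$\tilde{Q}\{c \mid a\} = Q\{c \mid a\} \exp\big(g(a, c) - U_g(a)\big). \tag{3.3}$$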
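A short sketch of this change of measure for a kernel with finite support may help. The encodings and names are illustrative assumptions; the point is only that subtracting the log-normalizer in the exponent makes each row of the tilted kernel a probability law again, and that the log-normalizer is bounded by the bound on g.

```python
import math


def U(a, g, Q):
    """Log-normalizer: log sum_c exp(g(a, c)) * Q{c | a}."""
    return math.log(sum(math.exp(g(a, c)) * p for c, p in Q[a].items()))


def tilted_kernel(g, Q):
    """Tilted offspring law: Q{c | a} * exp(g(a, c) - U_g(a))."""
    return {
        a: {c: p * math.exp(g(a, c) - U(a, g, Q)) for c, p in row.items()}
        for a, row in Q.items()
    }


def tilted_initial(g, Q, mu):
    """Tilted root law: proportional to exp(U_g(a)) * mu(a)."""
    weights = {a: math.exp(U(a, g, Q)) * m for a, m in mu.items()}
    z = sum(weights.values())
    return {a: v / z for a, v in weights.items()}


Q = {"a": {(): 0.5, ("a", "a"): 0.5}}
mu = {"a": 1.0}


def g(a, c):
    # Hypothetical bounded test function on the finite support: rewards children.
    return 0.3 * len(c)


Q_tilde = tilted_kernel(g, Q)
mu_g = tilted_initial(g, Q, mu)
```

With this choice of g the tilted kernel puts more weight on having two children than the original kernel does.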

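By $\tilde{\mathbb{P}}$ we denote the transformed law and observe that $\tilde{\mathbb{P}}$ is absolutely continuous with respect to $\mathbb{P}$. Specifically, for each finite $X$,

$$\frac{d\tilde{\mathbb{P}}}{d\mathbb{P}}(X) = \frac{e^{U_g(X(\rho))}}{\int e^{U_g(a)} \mu(da)} \prod_{v \in V} \exp\big[g(X(v), C(v)) - U_g(X(v))\big] \tag{3.4}$$

$$= \frac{1}{\int e^{U_g(a)} \mu(da)} \prod_{v \in V} \exp\Big[g(X(v), C(v)) - \sum_{b \in \mathcal{X}} m(b, C(v))\, U_g(b)\Big] \tag{3.5}$$

$$= \frac{1}{\int e^{U_g(a)} \mu(da)} \exp\Big[|T| \,\Big\langle g - \sum_{b \in \mathcal{X}} m(b, \cdot)\, U_g(b),\, M_X \Big\rangle\Big], \tag{3.6}$$

where $C(v) = (N(v), X_1(v), \ldots, X_{N(v)}(v))$. We recall from [DMS03] the following exponential tightness theorem for the family of laws of $M_X$ on the space $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ equipped with their topology.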
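Lemma 3.1. For every $\alpha > 0$ there exists a compact $K_\alpha \subset \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ with

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{M_X \notin K_\alpha \mid |T| = n\} \le -\alpha.$$

Let $l \in \mathbb{N}$, and choose $B(l) \in \mathbb{N}$ large enough such that $Q\{e^{l^2 1_{\{N > B(l)\}}} \mid a\} \le 2$, for all $a \in \mathcal{X}$. Using the exponential Chebyshev inequality we obtain

$$\mathbb{P}\{M_X[N > B(l)] \ge l^{-1},\, |T| = n\} \le e^{-nl}\, \mathbb{E}\Big\{ e^{l^2 \sum_{v \in V} 1_{\{N(v) > B(l)\}}},\, |T| = n \Big\}$$

$$= e^{-nl}\, \mathbb{E}\Big\{ \prod_{v \in V} \exp\big(l^2 1_{\{N(v) > B(l)\}}\big),\, |T| = n \Big\}$$

$$\le e^{-nl} \Big( \sup_{a \in \mathcal{X}} Q\big\{\exp\big(l^2 1_{\{N > B(l)\}}\big) \mid a\big\} \Big)^n \le e^{-n(l - \log 2)}.$$

Fix $\alpha$ and choose $M > \alpha + \log 2$. Define the set $\Gamma_M = \{\nu : \nu[N > B(l)] < l^{-1}, \text{ for all } l \ge M\}$. Note that $\{N \le B(l)\} \subset \mathcal{X} \times \mathcal{X}^*$ is compact, and so the set $\Gamma_M$ is pre-compact in the weak topology, by Prohorov's criterion. Moreover,

$$\mathbb{P}\{M_X \notin \Gamma_M \mid |V(T)| = n\} \le \frac{1}{\mathbb{P}\{|T| = n\}}\, \frac{1}{1 - e^{-n}}\, \exp\big(-n(M - \log 2)\big),$$

hence using [DMS03, Lemma 3.1] we get

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{M_X \notin K_\alpha \mid |V(T)| = n\} \le -\alpha,$$

for the closure $K_\alpha$ of $\Gamma_M$. This ends the proof of the tightness lemma.

We remark here that this result also holds for $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ endowed with the weak topology.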

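We denote by $\mathcal{M}_s$ the set of all sub-consistent measures, and by $\mathcal{M}_c$ the set of all consistent measures in $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$, and notice that $\mathcal{M}_c \subseteq \mathcal{M}_s$. For $k$ a natural number, we denote by $\mathcal{M}_{c,k}$ the set of consistent measures in $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}_k^*)$. Then $\mathcal{M}_s$ is a closed subset of $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ and $\mathcal{M}_{c,k}$ is a closed subset of $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}_k^*)$. The next two lemmas will help us extend the LDP from $\mathcal{M}_{c,k}$ and $\mathcal{M}_s$ to $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}_k^*)$ and $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$, respectively.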

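Lemma 3.2. Suppose $X$ is a multitype Galton-Watson tree with offspring law $Q$. Assume $(L_X, M_X)$ conditioned on the event $\{|T| = n\}$ satisfies the LDP in $\mathcal{M}_s$ with convex, good rate function

$$J(\varpi, \nu) = \begin{cases} H(\nu \,\|\, \nu_1 \otimes Q) & \text{if } \varpi_2 = \nu_1, \\ \infty & \text{otherwise.} \end{cases} \tag{3.7}$$

Then $(L_X, M_X)$ conditioned on the event $\{|T| = n\}$ satisfies the LDP in $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ with convex, good rate function $J$.

Proof. Observe that $\{|T| = n\} := \{\omega : |T|(\omega) = n\} \subset \{\omega : (L_X, M_X)(\omega) \in \mathcal{M}_c\} =: \{(L_X, M_X) \in \mathcal{M}_c\}$, and so, for all $n$, we have $\mathbb{P}\{(L_X, M_X) \in \mathcal{M}_s \mid |T| = n\} = 1$. Also, if $(\varpi_n, \nu_n) \in \mathcal{M}_s$ converges to $(\varpi, \nu)$ then, by Fatou's lemma, we have that

$$\varpi(a, b) = \lim_{n \to \infty} \varpi_n(a, b) \ge \limsup_{n \to \infty} \sum_{c \in \mathcal{X}^*} m(b, c)\, \nu_n(a, c) \ge \liminf_{n \to \infty} \sum_{c \in \mathcal{X}^*} m(b, c)\, \nu_n(a, c) \ge \sum_{c \in \mathcal{X}^*} m(b, c)\, \nu(a, c),$$

which implies that $(\varpi, \nu)$ is sub-consistent. This means $\mathcal{M}_s$ is a closed subset of $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$. Therefore, by [DZ98, Lemma 4.1.5], the LDP for $(L_X, M_X)$ conditioned on the event $\{|T| = n\}$ holds with convex, good rate function $J$. □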

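Lemma 3.3. Suppose $X$ is a multitype Galton-Watson tree with offspring law $Q_k$. Assume $(L_X, M_X)$ conditioned on the event $\{|T| = n\}$ satisfies the LDP in $\mathcal{M}_{c,k}$ with convex, good rate function

$$J_k(\varpi, \nu) = \begin{cases} H(\nu \,\|\, \nu_1 \otimes Q_k) & \text{if } \varpi_2 = \nu_1, \\ \infty & \text{otherwise.} \end{cases} \tag{3.8}$$

Then $(L_X, M_X)$ conditioned on the event $\{|T| = n\}$ satisfies the LDP in $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}_k^*)$ with convex, good rate function $J_k$.

Proof. Using the same argument as in the proof of Lemma 3.2 we have $\mathbb{P}\{(L_X, M_X) \in \mathcal{M}_{c,k} \mid |T| = n\} = 1$. Moreover, as $m(a, c) \le k$ for all $(a, c) \in \mathcal{X} \times \mathcal{X}_k^*$, if $(\varpi_n, \nu_n) \in \mathcal{M}_{c,k}$ converges to $(\varpi, \nu)$ then we have

$$\varpi(a, b) = \lim_{n \to \infty} \varpi_n(a, b) = \lim_{n \to \infty} \sum_{c \in \mathcal{X}_k^*} m(b, c)\, \nu_n(a, c) = \sum_{c \in \mathcal{X}_k^*} m(b, c)\, \nu(a, c),$$

which implies that $(\varpi, \nu)$ is consistent. This means $\mathcal{M}_{c,k}$ is a closed subset of $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}_k^*)$. Hence, by [DZ98, Lemma 4.1.5], the LDP for $(L_X, M_X)$ conditioned on the event $\{|T| = n\}$ holds with convex, good rate function $J_k$, which completes the proof of the lemma. □

In view of Lemmas 3.3 and 3.2 we establish large deviation principles in the spaces $\mathcal{M}_{c,k}$ and $\mathcal{M}_s$.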

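3.2 Proof of the upper bound in Theorem 2.1. Next we derive an upper bound in a variational formulation. Denote by $\mathcal{C}$ the space of bounded functions on $\mathcal{X} \times \mathcal{X}^*$ and define, for each sub-consistent element $(\varpi, \nu)$ of $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$, the function $\hat{J}$ by

$$\hat{J}(\varpi, \nu) = \sup_{g \in \mathcal{C}} \Big\{ \int g(b, c)\, \nu(db, dc) - \int U_g(b)\, \varpi(da, db) \Big\}$$

$$= \sup_{g \in \mathcal{C}} \Big\{ \int g(b, c)\, \nu(db, dc) - \int \sum_{b \in \mathcal{X}} m(b, c)\, U_g(b)\, \nu(da, dc) \Big\},$$

where $c = (n, a_1, \ldots, a_n)$.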

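Lemma 3.4. For each closed set $F \subset \mathcal{M}_s$, we have

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{(L_X, M_X) \in F \mid |T| = n\} \le -\inf_{(\varpi, \nu) \in F} \hat{J}(\varpi, \nu).$$

Let $g \in \mathcal{C}$ be bounded by $M$. Note from the definition of $U_g$ in (3.1) that $U_g \le M$. Using (3.4), we obtain

$$e^M \ge \int e^{U_g(a)} \mu(da) \ge \mathbb{E}\Big\{ \exp\Big[ n \Big\langle g - \sum_{b \in \mathcal{X}} m(b, \cdot)\, U_g(b),\, M_X \Big\rangle \Big],\, |V(T)| = n \Big\}.$$

Now, we take the limit as $n$ approaches infinity and use [DMS03, Lemma 3.1] to obtain

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{E}\Big\{ \exp\Big[ n \Big\langle g - \sum_{b \in \mathcal{X}} m(b, \cdot)\, U_g(b),\, M_X \Big\rangle \Big] \,\Big|\, |V(T)| = n \Big\} \le 0. \tag{3.9}$$

Similarly, we can use (3.4) and [DMS03, Lemma 3.1] to obtain

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{E}\Big\{ \exp\big[ n \langle g - U_g,\, M_X \rangle \big] \,\Big|\, |V(T)| = n \Big\} \le 0. \tag{3.10}$$

Next, we write $\hat{J}_\varepsilon(\varpi, \nu) := \min\{\hat{J}(\varpi, \nu), \varepsilon^{-1}\} - \varepsilon$. Fix $(\varpi, \nu) \in F$ and choose $g \in \mathcal{C}$ such that

$$\big[\langle g, \nu \rangle - \langle U_g, \varpi \rangle\big] \ge \hat{J}_\varepsilon(\varpi, \nu).$$

Now, since $g$ and $U_g$ are both bounded functions, we can find open neighbourhoods $B_\varpi$ and $B_\nu$ of $\varpi$ and $\nu$, respectively, such that we have

$$\inf_{(\tilde\varpi, \tilde\nu) \in B_\varpi \times B_\nu} \big[\langle g, \tilde\nu \rangle - \langle U_g, \tilde\varpi \rangle\big] \ge \big[\langle g, \nu \rangle - \langle U_g, \varpi \rangle\big] - \varepsilon \ge \hat{J}_\varepsilon(\varpi, \nu) - \varepsilon. \tag{3.11}$$

Applying the exponential Chebyshev inequality to (3.11) and using (3.9) we obtain that

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{(L_X, M_X) \in B_\varpi \times B_\nu \mid |V(T)| = n\}$$

$$\le \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{E}\Big\{ \exp\Big[ n \Big\langle g - \sum_{b \in \mathcal{X}} m(b, \cdot)\, U_g(b),\, M_X \Big\rangle \Big] \,\Big|\, |V(T)| = n \Big\} - \hat{J}_\varepsilon(\varpi, \nu) + \varepsilon$$

$$\le -\inf_{(\varpi, \nu) \in F} \hat{J}_\varepsilon(\varpi, \nu) + \varepsilon. \tag{3.12}$$

Let us assume that $(\varpi, \nu)$ fails to be sub-consistent, i.e. there exists $a \in \mathcal{X}$ such that

$$\nu_1(a) < \varpi_2(a). \tag{3.13}$$

Then we can find $\delta > 0$ and a small open neighbourhood $B_\varpi \times B_\nu \subset \tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ such that

$$\tilde\nu_1(a) < \tilde\varpi_2(a) - \delta, \quad \text{for all } (\tilde\varpi, \tilde\nu) \in B_\varpi \times B_\nu. \tag{3.14}$$

We write $g(b, c) = -(\delta\varepsilon)^{-1} 1_a(b)$ and note that, by the definition (3.1), we have $U_g(b) = g(b, c)$ for all $b$, and this vanishes unless $b = a$. Hence, by (3.14), for every $(\tilde\varpi, \tilde\nu) \in B_\varpi \times B_\nu$ we have that

$$\int \Big( g - \sum_{b \in \mathcal{X}} m(b, \cdot)\, U_g(b) \Big)\, d\tilde\nu \ge \varepsilon^{-1}.$$

We use the exponential Chebyshev inequality and (3.11) to obtain

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{(L_X, M_X) \in B_\varpi \times B_\nu \mid |V(T)| = n\}$$

$$\le \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{E}\Big\{ \exp\Big[ n \Big\langle g - \sum_{b \in \mathcal{X}} m(b, \cdot)\, U_g(b),\, M_X \Big\rangle \Big] \,\Big|\, |V(T)| = n \Big\} - \varepsilon^{-1} \le -\varepsilon^{-1}$$

$$\le -\inf_{(\varpi, \nu) \in F} \hat{J}_\varepsilon(\varpi, \nu) + \varepsilon. \tag{3.15}$$

Now we use Lemma 3.1 to choose a compact set $K_\alpha$ (for $\alpha = \varepsilon^{-1}$) with

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{M_X \notin K_\alpha \mid |V(T)| = n\} \le -\varepsilon^{-1}. \tag{3.16}$$

For this $K_\alpha$ we denote by

$$\Gamma_\alpha := \{(\varpi, \nu) : (\varpi, \nu) \in \tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}^*),\ \nu \in K_\alpha\}.$$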






KWABENA DOKU-AMPONSAH 



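The set $\Gamma_\alpha \cap F$ is compact and hence it may be covered by finitely many of the sets $B_{\varpi_1} \times B_{\nu_1}, \ldots, B_{\varpi_m} \times B_{\nu_m}$, with $(\varpi_i, \nu_i) \in F$ for $i = 1, \ldots, m$. Hence,

$$\mathbb{P}\{(L_X, M_X) \in F \mid |T| = n\} \le \sum_{i=1}^{m} \mathbb{P}\{(L_X, M_X) \in B_{\varpi_i} \times B_{\nu_i} \mid |T| = n\} + \mathbb{P}\{M_X \notin K_\alpha \mid |T| = n\}.$$

Using (3.12) and (3.15) we obtain, for small enough $\varepsilon > 0$, that

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{(L_X, M_X) \in F \mid |T| = n\} \le \max_{i = 1, \ldots, m} \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{(L_X, M_X) \in B_{\varpi_i} \times B_{\nu_i} \mid |V(T)| = n\}$$

$$\le -\inf_{(\varpi, \nu) \in F} \hat{J}_\varepsilon(\varpi, \nu) + \varepsilon.$$

Taking $\varepsilon \downarrow 0$ gives the required statement. The proof of this lemma is completely analogous to the proof of [DMS03, Lemma 3.3, p. 983].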
Recall that $J : \mathcal{M}_s \to [0, \infty]$ is given by

$$J(\varpi, \nu) = \begin{cases} H(\nu \,\|\, \nu_1 \otimes Q) & \text{if } \varpi_2 = \nu_1, \\ \infty & \text{otherwise.} \end{cases} \tag{3.17}$$

We show that the convex rate function $J$ may replace the function $\hat{J}$ in the upper bound of Lemma 3.4.

Lemma 3.5. The function $J$ is convex and lower semicontinuous on $\mathcal{M}_s$. Moreover, $J(\varpi, \nu) \le \hat{J}(\varpi, \nu)$, for any $(\varpi, \nu) \in \mathcal{M}_s$.

The proof of the inequality $J(\varpi, \nu) \le \hat{J}(\varpi, \nu)$ is analogous to the proof of [DMS03, Lemma 3.4]. To prove that $J$ is a convex, good rate function, we consider the convex, good rate function $\phi : \mathbb{R} \to [0, \infty]$ given by $\phi(x) = x \log x - x + 1$. Then, we can represent the relative entropy in (3.17) in the form

$$H(\nu \,\|\, \nu_1 \otimes Q) = \begin{cases} \int \phi \circ f \, d(\nu_1 \otimes Q) & \text{if } f := \frac{d\nu}{d(\nu_1 \otimes Q)} \text{ exists,} \\ \infty & \text{otherwise.} \end{cases} \tag{3.18}$$

Consequently, by [DZ98, Lemma 6.2.16], $J$ is a convex, good rate function.

By Lemma 3.2 the large deviation upper bound of Lemma 3.4 holds with rate function $\hat{J}$ replaced by $J$.

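3.3 Proof of Theorem 2.3. We begin by recalling from [DMS03] that $\nu \in \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ is shift-invariant if

$$\nu_1(a) = \sum_{(b, c) \in \mathcal{X} \times \mathcal{X}^*} m(a, c)\, \nu(b, c), \quad \text{for all } a \in \mathcal{X}.$$

This theorem is derived from [DMS03, Theorem 2.2] by applying the contraction principle to the linear mapping $G : \mathcal{M}(\mathcal{X} \times \mathcal{X}_k^*) \to \tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}_k^*)$ given by $G(\nu) = (\varpi, \nu)$, where $(\varpi, \nu)$ is consistent. Specifically, [DMS03, Theorem 2.2] implies the large deviation principle for $G(M_X) = (L_X, M_X)$ with convex, good rate function $J_k(\varpi, \nu) = \inf\{K_k(y) : G(y) = (\varpi, \nu),\ (\varpi, \nu) \text{ is consistent}\}$, where

$$K_k(\nu) = \begin{cases} H(\nu \,\|\, \nu_1 \otimes Q_k) & \text{if } \langle m(\cdot, c), \nu(a, c) \rangle = \nu_1, \\ \infty & \text{otherwise.} \end{cases} \tag{3.19}$$

Using shift-invariance and consistency we have

$$\nu_1(a) = \sum_{(b, c) \in \mathcal{X} \times \mathcal{X}^*} m(a, c)\, \nu(b, c) = \sum_{b \in \mathcal{X}} \varpi(b, a) = \varpi_2(a), \quad \text{for all } a \in \mathcal{X}.$$

Therefore, by Lemma 3.3 the LDP for $(L_X, M_X)$ conditional on the event $\{|T| = n\}$ holds in $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}_k^*)$ with convex, good rate function $J_k$.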

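3.4 Proof of the Lower Bound in Theorem 2.1

We define for every weakly irreducible, critical offspring kernel $Q\{c \mid a\}$ the conditional offspring law

$$Q_k\{c \mid a\} = \begin{cases} \dfrac{Q\{c \mid a\}}{Q\{\mathcal{X}_k^* \mid a\}} & \text{if } c \in \mathcal{X}_k^*, \\ 0 & \text{otherwise,} \end{cases} \tag{3.20}$$

where $Q\{\mathcal{X}_k^* \mid a\} = \sum_{c \in \mathcal{X}_k^*} Q\{c \mid a\}$. We write $\|\nu\|_k = \nu(\mathcal{X} \times \mathcal{X}_k^*)$ and observe that

$$\lim_{k \to \infty} \nu(\mathcal{X} \times \mathcal{X}_k^*) = \lim_{k \to \infty} \sum_{(a, c) \in \mathcal{X} \times \mathcal{X}^*} 1_{\{(a, c) \in \mathcal{X} \times \mathcal{X}_k^*\}}\, \nu(a, c) = 1,$$

by dominated convergence. We define $\nu_k$, a probability measure on $\mathcal{X} \times \mathcal{X}^*$, by

$$\nu_k(a, c) = \begin{cases} \dfrac{\nu(a, c)}{\|\nu\|_k} & \text{if } (a, c) \in \mathcal{X} \times \mathcal{X}_k^*, \\ 0 & \text{otherwise,} \end{cases} \tag{3.21}$$

and denote by $(\nu_k)_1$ the $\mathcal{X}$-marginal of the probability measure $\nu_k$. We write

$$\varpi_k(a, b) := \sum_{c \in \mathcal{X}^*} m(b, c)\, \nu_k(a, c).$$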
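The truncation step can be sketched as follows, again for finitely supported objects encoded as dictionaries (an assumption made for the example, as are the function names). The kernel and the measure are restricted to offspring vectors with at most k children and renormalized, and the pair measure is then defined from the truncated offspring measure so that the pair is consistent by construction. The sketch assumes that the restricted mass is strictly positive.

```python
def truncate_kernel(Q, k):
    """Conditional offspring law given at most k children (assumes positive mass)."""
    out = {}
    for a, row in Q.items():
        kept = {c: p for c, p in row.items() if len(c) <= k}
        z = sum(kept.values())
        out[a] = {c: p / z for c, p in kept.items()}
    return out


def truncate_measure(nu, k):
    """Return (nu_k, mass): nu restricted to at most k children, renormalized."""
    kept = {(a, c): p for (a, c), p in nu.items() if len(c) <= k}
    mass = sum(kept.values())
    return {key: p / mass for key, p in kept.items()}, mass


def pair_measure(nu):
    """w(a, b) = sum over c of m(b, c) * nu(a, c)."""
    w = {}
    for (a, c), p in nu.items():
        for b in set(c):
            w[(a, b)] = w.get((a, b), 0.0) + c.count(b) * p
    return w


Q = {"a": {(): 0.5, ("a",): 0.25, ("a", "a", "a"): 0.25}}
nu = {("a", ()): 0.5, ("a", ("a",)): 0.25, ("a", ("a", "a", "a")): 0.25}

Q_1 = truncate_kernel(Q, 1)
nu_1, mass_1 = truncate_measure(nu, 1)
nu_3, mass_3 = truncate_measure(nu, 3)
w_1 = pair_measure(nu_1)
w_3 = pair_measure(nu_3)
```

As k grows past the largest offspring number in the support, the restricted mass reaches one and the truncated objects coincide with the original ones.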

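Note that for any bounded function $f : \mathcal{X} \times \mathcal{X}^* \to \mathbb{R}$, we have $\lim_{k \to \infty} \int f \, d\nu_k = \int f \, d\nu$. We write, for $(a, c) \in \mathcal{X} \times \mathcal{X}^*$,

$$f_k(a, c) := 1_{\{\nu_k(a, c) > 0\}} \log \frac{\nu_k(a, c)}{(\nu_k)_1 \otimes Q_k(a, c)},$$

$$f^{(0, q]}(a, c) := 1_{\{\nu_1 \otimes Q(a, c) \ge \nu(a, c) > 0\}} \log \frac{\nu(a, c)}{\nu_1 \otimes Q(a, c)},$$

$$f^{(q, 1]}(a, c) := 1_{\{1 \ge \nu(a, c) > \nu_1 \otimes Q(a, c)\}} \log \frac{\nu(a, c)}{\nu_1 \otimes Q(a, c)},$$

and observe that $f^{(q, 1]}$ is a positive bounded function but $f^{(0, q]}$ is a non-positive, unbounded continuous function.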

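Lemma 3.6 (Limit Inequalities). Let $\nu \ll \nu_1 \otimes Q$. Then the following limits hold:

(i) $\limsup_{k \to \infty} \langle f^{(0, q]}, \nu_k \rangle \le \langle f^{(0, q]}, \nu \rangle$,

(ii) $\limsup_{k \to \infty} \langle f_k, \nu_k \rangle \le \langle f^{(0, q]}, \nu \rangle + \langle f^{(q, 1]}, \nu \rangle$.

Proof. (i) Recall that we have assumed $\nu \ll \nu_1 \otimes Q$ and note that by definition $f^{(0, q]} \le 0$. Using [DZ98, Theorem D.12, p. 357], we obtain

$$\liminf_{k \to \infty} \langle -f^{(0, q]}, \nu_k \rangle \ge \langle -f^{(0, q]}, \nu \rangle.$$

Factoring $-1$ out of the integrals and dividing both sides of the inequality by $-1$, we obtain the limit inequality (i) of Lemma 3.6.

(ii) Notice that $\lim_{k \to \infty} f_k = f^{(0, q]} + f^{(q, 1]}$. Hence, for any $\delta > 0$ and sufficiently large $k$ we have that

$$\langle f_k, \nu_k \rangle \le \langle f^{(0, q]}, \nu_k \rangle + \langle f^{(q, 1]}, \nu_k \rangle + \delta. \tag{3.22}$$

Next, we take the limit as $k$ approaches $\infty$ on both sides of (3.22) and use Lemma 3.6 (i) to obtain

$$\limsup_{k \to \infty} \langle f_k, \nu_k \rangle \le \langle f^{(0, q]}, \nu \rangle + \langle f^{(q, 1]}, \nu \rangle + \delta. \tag{3.23}$$

Taking $\delta \downarrow 0$ we have the desired result, which concludes the proof of the lemma. □

We define the total variation metric $d$ by

$$d(\nu, \tilde\nu) = \frac{1}{2} \sum_{(a, c) \in \mathcal{X} \times \mathcal{X}^*} |\nu(a, c) - \tilde\nu(a, c)|. \tag{3.24}$$

This metric generates the weak topology. By $\mathcal{N}(\mathcal{X})$ we denote the space of counting measures on $\mathcal{X}$. We recall that $\mathcal{M}_s$ denotes the set of sub-consistent measures in $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$, and $\mathcal{M}_{c,k}$ denotes the set of consistent measures in $\tilde{\mathcal{M}}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}_k^*)$. In the next five lemmas, we approximate $J(\varpi, \nu)$ for a sub-consistent pair $(\varpi, \nu)$ by $J_k(\varpi_k, \nu_k)$ with $(\varpi_k, \nu_k)$ consistent.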

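Lemma 3.7 (Approximation Step 1). Suppose $(\varpi, \nu)$ is sub-consistent. Then there exists a sequence $(\varpi_n, \hat\nu_n) \in \mathcal{M}_s$ such that

(i) $(\varpi_n, \hat\nu_n)$ is consistent.

(ii) $(\varpi_n, \hat\nu_n) \to (\varpi, \nu)$ as $n \to \infty$.

Proof. For any $b \in \mathcal{X}$ we define $e^{(b)} \in \mathcal{N}(\mathcal{X})$ by $e^{(b)}(a) = 0$ if $a \neq b$, and $e^{(b)}(a) = 1$ if $a = b$. We write $m(c) = (m(a, c),\, a \in \mathcal{X})$ and for large $n$ define $\hat\nu_n$ by

$$\hat\nu_n(a, c) = \nu(a, c) \Big( 1 - \frac{\|\varpi\| - \|\langle m(\cdot, c), \nu(\cdot, c) \rangle\|}{n} \Big) + \sum_{b \in \mathcal{X}} 1_{\{m(c) = n e^{(b)}\}}\, \frac{\varpi(a, b) - \langle m(\cdot, c), \nu(\cdot, c) \rangle(a, b)}{n}. \tag{3.25}$$

We note that $\hat\nu_n \to \nu$ and that, for all $a, b \in \mathcal{X}$,

$$\sum_{c \in \mathcal{X}^*} m(b, c)\, \hat\nu_n(a, c) = \Big( 1 - \frac{\|\varpi\| - \|\langle m(\cdot, c), \nu(\cdot, c) \rangle\|}{n} \Big) \sum_{c \in \mathcal{X}^*} m(b, c)\, \nu(a, c) + \varpi(a, b) - \langle m(\cdot, c), \nu(\cdot, c) \rangle(a, b)$$

$$= \varpi(a, b) - \frac{\|\varpi\| - \|\langle m(\cdot, c), \nu(\cdot, c) \rangle\|}{n}\, \langle m(\cdot, c), \nu(\cdot, c) \rangle(a, b) \uparrow \varpi(a, b).$$

Defining $\varpi_n$ by $\varpi_n(a, b) = \sum_{c \in \mathcal{X}^*} m(b, c)\, \hat\nu_n(a, c)$, we have a sequence of consistent pairs $(\varpi_n, \hat\nu_n)$ converging to $(\varpi, \nu)$. □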

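From $(\varpi_n, \hat\nu_n)$ we construct a sequence $(\varpi_{k,n}, \hat\nu_{k,n})$ in $\mathcal{M}_{c,k}$ close in the limit to $(\varpi, \nu)$. To this end, for $k \in \mathbb{N}$, we define $\hat\nu_{k,n}$, a probability measure on $\mathcal{X} \times \mathcal{X}^*$, by

$$\hat\nu_{k,n}(a, c) = \begin{cases} \dfrac{\hat\nu_n(a, c)}{\|\hat\nu_n\|_k} & \text{if } (a, c) \in \mathcal{X} \times \mathcal{X}_k^*, \\ 0 & \text{otherwise,} \end{cases} \tag{3.26}$$

and denote by $(\hat\nu_{k,n})_1$ the $\mathcal{X}$-marginal of the probability measure $\hat\nu_{k,n}$. We take

$$\varpi_{k,n}(a, b) := \sum_{c \in \mathcal{X}^*} m(b, c)\, \hat\nu_{k,n}(a, c).$$

Lemma 3.8 (Approximation Step 2). Suppose $(\varpi, \nu)$ is sub-consistent. Then, for every $\varepsilon > 0$, there exist $k(\varepsilon)$ and $n(\varepsilon)$ such that for $k \ge k(\varepsilon)$, $n \ge n(\varepsilon)$, we have

$$|\varpi(a, b) - \varpi_{k,n}(a, b)| \le \varepsilon, \quad \text{for all } a, b \in \mathcal{X}.$$

Proof. Let $\varepsilon > 0$ and take $\delta = \varepsilon/3$. We choose $k(\varepsilon)$ large enough such that

$$\sup_{(a, c) \in \mathcal{X} \times \mathcal{X}^*} \Big| \frac{1_{\{(a, c) \in \mathcal{X} \times \mathcal{X}_k^*\}}}{\|\hat\nu_n\|_k} - 1 \Big| \le \delta.$$

For $k \ge k(\varepsilon)$, we recall the definitions of $\hat\nu_{k,n}$ and $\varpi_{k,n}$, and observe that

$$d(\hat\nu_{k,n}, \hat\nu_n) = \frac{1}{2} \sum_{(a, c)} |\hat\nu_{k,n}(a, c) - \hat\nu_n(a, c)| = \frac{1}{2} \sum_{(a, c)} \hat\nu_n(a, c)\, \Big| \frac{1_{\{(a, c) \in \mathcal{X} \times \mathcal{X}_k^*\}}}{\|\hat\nu_n\|_k} - 1 \Big| \le \delta = \varepsilon/3.$$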

Define for large n > f the measure G7 ra by zn n (a, b) = (1 + ^)aj(a, 6), for a, b G Af. Then, it is not hard 
to see that 

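$$\lim_{n \to \infty} \varpi_n(a, b) = \varpi(a, b), \quad \text{for all } a, b \in \mathcal{X},$$

i.e. for every $\varepsilon > 0$ there exists $n(\varepsilon) \in \mathbb{N}$ such that $n \ge n(\varepsilon)$ implies $|\varpi_n(a, b) - \varpi(a, b)| \le \varepsilon/2$. Now, observe that we have

$$\lim_{n \to \infty} \lim_{k \to \infty} \varpi_{k,n}(a, b) \ge \lim_{n \to \infty} \liminf_{k \to \infty} \sum_{c \in \mathcal{X}^*} m(b, c)\, \hat\nu_{k,n}(a, c) \ge \lim_{n \to \infty} \sum_{c \in \mathcal{X}^*} m(b, c)\, \hat\nu_n(a, c) = \varpi(a, b),$$

for all $a, b \in \mathcal{X}$, where we have used Fatou's lemma in the last inequality. Hence, there exists $(k(\varepsilon), n(\varepsilon)) \in \mathbb{N} \times \mathbb{N}$ such that $n \ge n(\varepsilon)$, $k \ge k(\varepsilon)$ implies

$$\varpi_{k,n}(a, b) \ge \varpi(a, b) - \varepsilon/2, \quad \text{for all } a, b \in \mathcal{X}.$$

So, for $k \ge k(\varepsilon)$ and $n \ge n(\varepsilon)$ we have that

$$|\varpi_n(a, b) - \varpi_{k,n}(a, b)| \le |\varpi_n(a, b) - \varpi(a, b)| + |\varpi(a, b) - \varpi_{k,n}(a, b)| \le \varepsilon/2 + \varepsilon/2 = \varepsilon,$$

by the triangle inequality. This gives

$$\lim_{n \to \infty} \lim_{k \to \infty} \varpi_{k,n}(a, b) = \lim_{n \to \infty} \varpi_n(a, b) = \varpi(a, b), \quad \text{for all } a, b \in \mathcal{X},$$

which concludes the proof of the lemma. □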

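Lemma 3.9 (Approximation Step 3). Suppose $(\varpi, \nu)$ is sub-consistent. Then, for every $\varepsilon > 0$, there exists $k(\varepsilon)$ such that for $k \ge k(\varepsilon)$, $|\varpi(a, b) - \varpi_k(a, b)| \le \varepsilon$ for all $a, b \in \mathcal{X}$, and $d(\nu_k, \nu) \le \varepsilon$.

Proof. We begin the proof by noting that

$$\lim_{n \to \infty} \varpi_{k,n}(a, b) = \varpi_k(a, b), \quad \text{for all } a, b \in \mathcal{X},$$

and $\lim_{n \to \infty} \hat\nu_{k,n} = \nu_k$ by construction. Fix $\delta = \varepsilon/3$; then there exists $(k(\varepsilon), n(\varepsilon)) \in \mathbb{N} \times \mathbb{N}$ such that $n \ge n(\varepsilon)$, $k \ge k(\varepsilon)$ implies

$$|\varpi_k(a, b) - \varpi_{k,n}(a, b)| \le \varepsilon/2,$$
$$|\varpi_{k,n}(a, b) - \varpi(a, b)| \le \varepsilon/2,$$
$$d(\nu_k, \hat\nu_{k,n}) \le \delta,$$
$$d(\hat\nu_{k,n}, \hat\nu_n) \le \delta,$$

and

$$d(\hat\nu_n, \nu) \le \delta.$$

Using the above inequalities we have

$$|\varpi_k(a, b) - \varpi(a, b)| \le |\varpi_k(a, b) - \varpi_{k,n}(a, b)| + |\varpi_{k,n}(a, b) - \varpi(a, b)| \le \varepsilon/2 + \varepsilon/2 = \varepsilon.$$

Finally, we have that $d(\nu_k, \nu) \le d(\nu_k, \hat\nu_{k,n}) + d(\hat\nu_{k,n}, \hat\nu_n) + d(\hat\nu_n, \nu) \le 3\delta = 3(\varepsilon/3) = \varepsilon$. □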

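Next we state a lemma analogous to Lemma 3.6 of Dembo et al. [DMS03]. It will help us approximate a pair of measures $(\varpi, \nu) \in \mathcal{M}_{c,k}$ with $\varpi_2 \neq \nu_1$ by a consistent pair $(\varpi_{x,y}, \nu_{x,y})$ with $(\varpi_{x,y})_2 = (\nu_{x,y})_1$, where $(\varpi_{x,y})_2$ and $(\nu_{x,y})_1$ are the second marginal of $\varpi_{x,y}$ and the $\mathcal{X}$-marginal of $\nu_{x,y}$, respectively. To do this, we collect some notation from [DMS03]. For $\nu \in \mathcal{M}_{c,k}$ and $a \in \mathcal{X}$ we write $\nu(\cdot \mid a) = \nu(a, \cdot)/\nu_1(a)$ and

\[
A_{0,0}(a,b) = \sum_{c \in \mathcal{X}^*} m(a,c)\,\nu(c \mid b), \quad \text{for } a,b \in \mathcal{X}.
\]

As $\mathcal{X}_k^*$ is finite we can find $b_0 \in \mathcal{X}_r$ such that $Q_k\{c \mid a\} > 0$ and also $\nu(c \mid b) > 0$, such that $\sum_{a \in \mathcal{X}_r} m(a, c_2)$ is large enough to ensure that the difference $\sum_{a \in \mathcal{X}_r} u_{0,0}\,[m(a, c_2) - m(a, c_1(b_0))] > 0$. Let $c_1(b)$ be any number for $b \in \mathcal{X}_r$ and $c_2 = c_1(b)$ for all $b \neq b_0$. For any $|x| < 1/2$ we recall the definition of the probability measure $\nu_{x,0}$ as

\[
\nu_{x,0}(c \mid b) = \nu(c \mid b) + x\,\nu(c_2 \mid b)\,\nu(c_1 \mid b)\big(\mathbf{1}_{\{c = c_2\}} - \mathbf{1}_{\{c = c_1\}}\big).
\]

Let $y_0 = Q\{c_2 \mid b_0\} \min_{b \in \mathcal{X}_r} Q\{c_1 \mid b\} > 0$, and for any $0 < y < y_0$ we recall the definition of the probability measures $\nu_{x,y}(\cdot \mid b)$ by

\[
\nu_{x,y}(c \mid b) = \min\big(\nu_{x,0}(c \mid b),\ Q\{c \mid b\}/y\big) \quad \text{for } c \neq c_1,
\]

\[
\nu_{x,y}(c_1 \mid b) = \nu_{x,0}(c_1 \mid b) + \sum_{c \neq c_1} \big(\nu_{x,0}(c \mid b) - Q\{c \mid b\}/y\big)_+,
\]

where $+$ indicates the positive part. Note that by construction

\[
A_{x,y}(a,b) = \sum_{c \in \mathcal{X}^*} m(a,c)\,\nu_{x,y}(c \mid b) \to A_{0,0}(a,b), \quad \text{for any } a,b \in \mathcal{X}.
\]

Finally, by $\rho(A_{x,y})$ and $(\nu_{x,y})_1$ we denote the Perron-Frobenius eigenvalue and eigenvector of the weakly irreducible matrix $A_{x,y}$, respectively.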

Lemma 3.10 (Approximation Step 4). Suppose $(\varpi, \nu) \in \mathcal{M}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}_k^*)$ is consistent with $\varpi_2 \neq \nu_1$. Let $(a,c) \in \mathcal{X} \times \mathcal{X}_k^*$. Then, for any $y \in (0, y_0)$ and $x(y) \in (-1/2, 1/2)$, we have

(i) $\nu_{x,y}(a,c) \le Q_k\{c \mid a\}\,(\nu_{x,y})_1(a)/y$,

(ii) $\nu_{x,y}(a,c) \to \nu(a,c)$ as $x \to 0$ and $y \downarrow 0$,

(iii) $\nu_{x,y}(a,c) = 0$ if and only if $\nu(a,c) = 0$,

(iv) $\varpi_{x,y}(a,b) \to \varpi(a,b)$ as $x \to 0$ and $y \downarrow 0$, for all $b \in \mathcal{X}$,

(v) $(\varpi_{x,y})_2(a) = (\nu_{x,y})_1(a)$.

Moreover,

\[
\limsup\, (\nu_{x,y})_1(a)\, H\big(\nu_{x,y}(\cdot \mid a) \,\|\, Q_k\{\cdot \mid a\}\big) \le \nu_1(a)\, H\big(\nu(\cdot \mid a) \,\|\, Q_k\{\cdot \mid a\}\big).
\]

The proof of this lemma, which is based on the Perron-Frobenius eigenvalue theorem and the implicit function theorem applied to the function $f(x,y) = \rho(A_{x,y})$, is omitted. See the proof of [DMS03, Lemma 3.6].

For $Q_k$ we recall the definition of the rate function $J_k : \mathcal{M}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}_k^*) \to [0, \infty]$ from Theorem 2.3 as

\[
J_k(\varpi, \nu) =
\begin{cases}
H(\nu \,\|\, \nu_1 \otimes Q_k) & \text{if } (\varpi, \nu) \text{ is consistent and } \varpi_2 = \nu_1, \\
\infty & \text{otherwise.}
\end{cases}
\]

Lemma 3.11 below is a key ingredient in our proof of the lower bound in Theorem 2.1 and will be proved using the above four approximation lemmas.



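Lemma 3.11 (Approximation Step 5). Suppose $(\varpi, \nu)$ is sub-consistent, $\varpi_2 = \nu_1$ and $\nu \ll \nu_1 \otimes Q$. Then, for every $\varepsilon > 0$, there exists $(\hat\varpi, \hat\nu) \in \mathcal{M}_{c,k}$ such that $|\varpi(a,b) - \hat\varpi(a,b)| < \varepsilon$, for all $a,b \in \mathcal{X}$, $d(\hat\nu, \nu) < \varepsilon$, $\hat\varpi_2 = \hat\nu_1$ and

\[
J_k(\hat\varpi, \hat\nu) - J(\varpi, \nu) \le \varepsilon.
\]

Proof. Recall our assumption that $(\varpi, \nu)$ is sub-consistent, $\varpi_2 = \nu_1$ and $\nu \ll \nu_1 \otimes Q$.

Case 1: If $(\varpi_k)_2 = (\nu_k)_1$ we take $\hat\nu := \nu_k$ and $\hat\varpi := \varpi_k$. Then $\hat\nu \ll (\hat\nu)_1 \otimes Q_k$, and for every $\varepsilon > 0$, $|\hat\varpi(a,b) - \varpi(a,b)| < \varepsilon$ for all $a,b \in \mathcal{X}$, $d(\hat\nu, \nu) < \varepsilon$ and $\hat\varpi_2 = \hat\nu_1$ by Lemma 3.9. Now, using Lemma 3.6 we obtain

\[
\limsup_{k \to \infty} J_k(\varpi_k, \nu_k) = \limsup_{k \to \infty} \sum_{(a,c) \in \mathcal{X} \times \mathcal{X}^*} \nu_k(a,c) \log \frac{\nu_k(a,c)}{(\nu_k)_1(a)\, Q_k\{c \mid a\}} \le \sum_{(a,c) \in \mathcal{X} \times \mathcal{X}^*} \nu(a,c) \log \frac{\nu(a,c)}{\nu_1(a)\, Q\{c \mid a\}} = J(\varpi, \nu).
\]

Case 2: If $(\varpi_k)_2 \neq (\nu_k)_1$ we use Lemma 3.10 to choose $(\varpi_{k,x,y}, \nu_{k,x,y})$ with $(\varpi_{k,x,y})_2 = (\nu_{k,x,y})_1$. We take $\hat\nu := \nu_{k,x,y}$ and $\hat\varpi := \varpi_{k,x,y}$. Then $\hat\nu \ll (\hat\nu)_1 \otimes Q_k$. Using Lemma 3.6 and Lemma 3.10 we obtain

\[
\limsup_{y \downarrow 0} \limsup_{k \to \infty} J_k(\varpi_{k,x,y}, \nu_{k,x,y}) = \lim_{y \downarrow 0} \limsup_{k \to \infty} \sum_{(a,c) \in \mathcal{X} \times \mathcal{X}^*} \nu_{k,x,y}(a,c) \log \frac{\nu_{k,x,y}(a,c)}{(\nu_{k,x,y})_1(a)\, Q_k\{c \mid a\}}
\le \limsup_{k \to \infty} \sum_{(a,c) \in \mathcal{X} \times \mathcal{X}^*} \nu_k(a,c) \log \frac{\nu_k(a,c)}{(\nu_k)_1(a)\, Q_k\{c \mid a\}},
\]

where $\nu_{k,x,y}(a,b) \to \nu_k(a,b)$ and $\varpi_{k,x,y}(a,b) \to \varpi_k(a,b)$ as $x \to 0$ and $y \downarrow 0$, for all $a,b \in \mathcal{X}$. This completes the proof of the lemma. □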

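We recall that $C(v) = (N(v), X_1(v), \ldots, X_{N(v)}(v))$ and note that, for every $k$ such that $\min_{a \in \mathcal{X}} Q\{\mathcal{X}_k^* \mid a\} > 0$ and any tree-indexed process $x$, we have that

\[
\begin{aligned}
\mathbb{P}\{X = x \mid |T| = n\} &\ge \mathbb{P}\{X = x,\ C(v) \in \mathcal{X}_k^*,\ v \in V \mid |T| = n\} \\
&= \prod_{v \in V,\ |T| = n} Q\{\mathcal{X}_k^* \mid x(v)\} \times \mathbb{P}_k\{X = x \mid |T| = n\} \\
&\ge \Big(\min_{a \in \mathcal{X}} Q\{\mathcal{X}_k^* \mid a\}\Big)^n \times \mathbb{P}_k\{X = x \mid |T| = n\},
\end{aligned}
\tag{3.27}
\]

where $\mathbb{P}_k$ denotes the law of the tree-indexed process with initial distribution $\mu$ and offspring kernel $Q_k$, and

\[
\lim_{k \to \infty} \min_{a \in \mathcal{X}} Q\{\mathcal{X}_k^* \mid a\} = \lim_{k \to \infty} \min_{a \in \mathcal{X}} \sum_{c \in \mathcal{X}_k^*} Q\{c \mid a\} = \lim_{k \to \infty} \min_{a \in \mathcal{X}} \sum_{c \in \mathcal{X}^*} \mathbf{1}_{\{c \in \mathcal{X}_k^*\}}\, Q\{c \mid a\} = 1,
\]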
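since $\mathcal{X}$ is a finite alphabet. To complete the proof of the lower bound, we take $O \subset \mathcal{M}_s$. Then, for any $(\varpi, \nu) \in O$ sub-consistent with $\varpi_2 = \nu_1$, $\nu \ll \nu_1 \otimes Q$, we may find $\varepsilon > 0$ with the ball around $(\varpi, \nu)$ of radius $2\varepsilon$ contained in $O$. By our approximation Lemma 3.11 we may find $(\varpi_k, \nu_k) \in O \cap \mathcal{M}_{c,k}$ with $|\varpi_k(a,b) - \varpi(a,b)| \downarrow 0$, $d(\nu_k, \nu) \downarrow 0$, $(\varpi_k)_2 = (\nu_k)_1$, $\nu_k \ll (\nu_k)_1 \otimes Q_k$ and

\[
J_k(\varpi_k, \nu_k) - J(\varpi, \nu) \le \varepsilon.
\]

Hence, using the lower bound of Theorem 2.3 for the offspring kernel given by (3.20), together with (3.27), for large $k \ge k(\varepsilon)$ (with $\min_{a \in \mathcal{X}} Q\{\mathcal{X}_k^* \mid a\} > 0$) and for large $n \ge n(\varepsilon)$, we obtain

\[
\begin{aligned}
\mathbb{P}\{(L_X, M_X) \in O \mid |T| = n\} &\ge \mathbb{P}\{|\varpi_k(a,b) - L_X(a,b)| < \varepsilon,\ \forall a,b \in \mathcal{X},\ d(\nu_k, M_X) < \varepsilon \mid |T| = n\} \\
&\ge e^{n a_k} \times \mathbb{P}_k\{|\varpi_k(a,b) - L_X(a,b)| < \varepsilon,\ \forall a,b \in \mathcal{X},\ d(\nu_k, M_X) < \varepsilon \mid |T| = n\} \\
&\ge \exp\big(-n(J_k(\varpi_k, \nu_k) + \varepsilon - a_k)\big),
\end{aligned}
\]

where $a_k = \log\big(\min_{a \in \mathcal{X}} Q\{\mathcal{X}_k^* \mid a\}\big)$. Taking limits we have that

\[
\liminf_{n \to \infty} \frac1n \log \mathbb{P}\{(L_X, M_X) \in O \mid |T| = n\} \ge -\lim_{k \to \infty} J_k(\varpi_k, \nu_k) + \lim_{k \to \infty} a_k - \varepsilon \ge -J(\varpi, \nu) - 2\varepsilon.
\]

Taking $\varepsilon \downarrow 0$ we have the desired result, which completes the proof of the lower bound.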
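To see how quickly the correction term $a_k$ can vanish, the following minimal sketch evaluates it in one concrete case. It assumes, for illustration only, that $\mathcal{X}_k^*$ collects the offspring vectors with at most $k$ children and that the number of children has the geometric law $p(\ell) = 2^{-(\ell+1)}$ of Section 4.3 for every type $a$, so that the minimum over $a$ is trivial; the function name `a_k` is hypothetical.

```python
import math


def a_k(k):
    """log of the probability of having at most k children under p(l) = 2**-(l+1)."""
    mass = sum(2.0 ** -(l + 1) for l in range(k + 1))  # equals 1 - 2**-(k+1)
    return math.log(mass)


for k in (1, 5, 10, 20):
    # a_k is negative and increases to 0 as k grows
    print(k, a_k(k))
```

Under these assumptions $a_k = \log(1 - 2^{-(k+1)})$, so the factor $e^{n a_k}$ in the lower bound costs nothing on the exponential scale once $k \to \infty$.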



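4. Proof of Corollaries 2.2, 2.5 and Theorem 2.4

4.1 Proof of Corollary 2.2. We derive this corollary from Theorem 2.1 by applying the contraction principle to the linear mapping $W : \mathcal{M}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}^*) \to \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ defined by

\[
W(\varpi, \nu)(a,c) = \nu(a,c), \quad \text{for all } (a,c) \in \mathcal{X} \times \mathcal{X}^*.
\]

In fact, Theorem 2.1 implies the large deviation principle for $W(L_X, M_X)$ with convex, good rate function $J(\nu) = \inf\{J(\varpi, \nu) : W(\varpi, \nu) = \nu\}$. Now, using sub-consistency and $\varpi_2 = \nu_1$ we obtain the form $J(\nu) = H(\nu \,\|\, \nu_1 \otimes Q)$, for $\nu$ satisfying $\langle m(\cdot, c), \nu(a,c) \rangle \le \nu_1$. Recall the definition of shift-invariance and denote by $\mathcal{M}_1$ the set of shift-invariant measures in $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$. Write

\[
\mathcal{M}_2 = \big\{\nu : \nu \in \mathcal{M}(\mathcal{X} \times \mathcal{X}^*),\ \langle m(\cdot, c), \nu(a,c) \rangle \le \nu_1\big\}
\]

and note that $\mathcal{M}_1 \subset \mathcal{M}_2$. Also, for all values of $n$ where $\mathbb{P}\{|T| = n\} > 0$, we have

\[
\mathbb{P}\{M_X \in \mathcal{M}_2 \mid |T| = n\} = 1.
\]

Moreover, if $\nu_n \in \mathcal{M}_2$ converges to $\nu$ then

\[
\nu_1(a) = \lim_{n \to \infty} (\nu_n)_1(a) \ge \liminf_{n \to \infty} \sum_{(b,c) \in \mathcal{X} \times \mathcal{X}^*} m(a,c)\,\nu_n(b,c) \ge \sum_{(b,c) \in \mathcal{X} \times \mathcal{X}^*} m(a,c)\,\nu(b,c),
\]

which implies $\nu$ is sub-consistent. This means $\mathcal{M}_2$ is a closed subset of $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$. Therefore, by [DZ98, Lemma 4.1.5], the LDP for $M_X$ conditional on the event $\{|T| = n\}$ holds with convex, good rate function $K$, which completes the proof of the corollary.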
4.2 Proof of Theorem 2.4. We begin the proof of the theorem by stating the following lemma.

Lemma 4.1. Suppose that $q(c) = p(n) \prod_{i=1}^n q(a_i)$ for all $c = (n, a_1, \ldots, a_n)$, where $q(\cdot)$ is a probability vector on $\mathcal{X}$ and $p(\cdot)$ a probability measure with mean one on the nonnegative integers. Then we have

\[
\inf\Big\{H(\nu \,\|\, q) : \nu \in \mathcal{M}(\mathcal{X}^*),\ \phi(b) = \sum_{c \in \mathcal{X}^*} m(b,c)\,\nu(c) \text{ for all } b \in \mathcal{X}\Big\} = z\,H(\phi/z \,\|\, q) + I_p(z),
\tag{4.1}
\]

where $\phi : \mathcal{X} \to \mathbb{R}$ and $z = \sum_{b \in \mathcal{X}} \phi(b)$.

Proof. For $\nu \in \mathcal{M}(\mathcal{X}^*)$, we let $\phi(b) = \sum_{c \in \mathcal{X}^*} m(b,c)\,\nu(c)$, for all $b \in \mathcal{X}$, and suppose first that $z = 0$, i.e. $\phi(b) = 0$ for all $b \in \mathcal{X}$. Then $\nu((0, \emptyset)) = 1$ is the only possible measure in the left side of (4.1), leading to $I(0, q) = -\log q((0, \emptyset)) = -\log p(0)$. It follows from the definition of $I_p$ that $I_p(0) = -\log p(0)$, establishing (4.1) for such $\phi(\cdot)$. We assume hereafter that $z > 0$. Now the possible measures $\nu(\cdot)$ in the left side of (4.1) are of the form $\nu(c) = s(n)\,\nu_n(a_1, \ldots, a_n)$ for $c = (n, a_1, \ldots, a_n)$, with $\nu_0 = 1$, where $s(\cdot)$ is a probability measure on the nonnegative integers whose mean is $z$, and $\nu_n(\cdot)$, $n \ge 1$, are probability measures on $\mathcal{X}^n$ with marginals $\nu_{n,i}(\cdot)$ such that

\[
\phi(b) = \sum_{n=1}^{\infty} s(n) \sum_{i=1}^{n} \nu_{n,i}(b) \quad \text{for all } b \in \mathcal{X}.
\tag{4.2}
\]

By the assumed structure of $q(\cdot)$ we have for such $\nu(\cdot)$ that

\[
H(\nu \,\|\, q) = \sum_{n=1}^{\infty} s(n)\, H(\nu_n \,\|\, q^n) + H(s \,\|\, p),
\tag{4.3}
\]

where $q^n$ denotes the product measure on $\mathcal{X}^n$ with equal marginals $q$. Recall that

\[
\sum_{n=1}^{\infty} s(n)\, H(\nu_n \,\|\, q^n) \ge \sum_{n=1}^{\infty} s(n) \sum_{i=1}^{n} H(\nu_{n,i} \,\|\, q) \ge z\, H\Big(\frac{1}{z} \sum_{n=1}^{\infty} s(n) \sum_{i=1}^{n} \nu_{n,i} \,\Big\|\, q\Big),
\]

with equality whenever $\nu_n = \prod_{i=1}^{n} \nu_{n,i}$ and the $\nu_{n,i}$ are independent of $n$ and $i$ (see [DZ98, Lemma 7.3.25] for the first inequality, with the second inequality following by convexity of $H(\cdot \,\|\, q)$ and the fact that $\sum_n s(n)\,n = z$). So, in view of (4.2),

\[
H(\nu \,\|\, q) \ge z\,H(\phi/z \,\|\, q) + H(s \,\|\, p),
\tag{4.4}
\]

with equality when 

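$\nu_n = (z^{-1}\phi)^n$ for all $n \ge 1$. Now, write $\Lambda_p(\lambda) := \log \sum_n e^{\lambda n} p(n)$ and notice that $\Lambda_p$ is a convex function and $\Lambda_p(0) = 0 < \infty$, and so we have, for every $\lambda \in \mathbb{R}$, $\Lambda_p(\lambda) > -\infty$. Using Jensen's inequality, for every $s \in \mathcal{M}(\mathbb{N} \cup \{0\})$ and every $\lambda \in \mathbb{R}$, we have

\[
\Lambda_p(\lambda) = \log \sum_n s(n)\, \frac{e^{\lambda n} p(n)}{s(n)} \ge \sum_n s(n) \log \frac{e^{\lambda n} p(n)}{s(n)} = \lambda \sum_n n\, s(n) - H(s \,\|\, p),
\]

with equality if $s_\lambda(n) = p(n)\, e^{\lambda n - \Lambda_p(\lambda)}$. Thus, for all $\lambda$ and all $z$, we have

\[
\lambda z - \Lambda_p(\lambda) \le \inf\Big\{H(s \,\|\, p) : s \in \mathcal{M}(\mathbb{N} \cup \{0\}) \text{ and } \sum_n s(n)\, n = z\Big\} =: \Lambda^*(z),
\tag{4.5}
\]

with equality when $\sum_n s_\lambda(n)\, n = z$. Elementary calculus also shows that

\[
\Lambda^*(z) = \lambda^* z - \Lambda_p(\lambda^*),
\tag{4.6}
\]

where $\lambda^*$ is the solution of $\Lambda_p'(\lambda^*) = z$ and $\Lambda_p'$ denotes the derivative of $\Lambda_p$. Combining (4.6) and (4.5) we obtain

\[
\sup_{\lambda \in \mathbb{R}} \{\lambda z - \Lambda_p(\lambda)\} \le \Lambda^*(z) \le \sup_{\lambda \in \mathbb{R},\ \Lambda_p'(\lambda) = z} \{\lambda z - \Lambda_p(\lambda)\} \le \sup_{\lambda \in \mathbb{R}} \{\lambda z - \Lambda_p(\lambda)\}.
\]

This yields $\Lambda^*(z) = I_p(z)$, which ends the proof of the lemma. □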
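The duality used in the proof of Lemma 4.1 can be checked numerically on a small example. The sketch below is only an illustration under assumptions of ours: the mean-one offspring law `p` on $\{0,1,2\}$, the value of `lam`, the comparison law `other`, and all function names are hypothetical choices. It verifies that the exponentially tilted law $s_\lambda$ attains equality in the Jensen bound, that another law satisfies the inequality, and that a grid search over $\lambda$ recovers the same value as the tilted law.

```python
import math


def log_mgf(p, lam):
    """Lambda_p(lam) = log sum_n exp(lam * n) p(n)."""
    return math.log(sum(math.exp(lam * n) * pn for n, pn in enumerate(p)))


def tilt(p, lam):
    """Exponentially tilted law s_lam(n) = p(n) exp(lam * n - Lambda_p(lam))."""
    big_lambda = log_mgf(p, lam)
    return [pn * math.exp(lam * n - big_lambda) for n, pn in enumerate(p)]


def rel_entropy(s, p):
    """Relative entropy H(s || p) for laws on the same finite support."""
    return sum(sn * math.log(sn / pn) for sn, pn in zip(s, p) if sn > 0)


def mean(s):
    return sum(n * sn for n, sn in enumerate(s))


p = [0.25, 0.5, 0.25]  # hypothetical offspring law with mean one
lam = 0.7
s_lam = tilt(p, lam)
z = mean(s_lam)

lhs = lam * z - log_mgf(p, lam)  # lam * z - Lambda_p(lam)
rhs = rel_entropy(s_lam, p)      # H(s_lam || p): equality case of Jensen

other = [0.2, 0.3, 0.5]          # any other law on {0, 1, 2}
jensen_gap = log_mgf(p, lam) - (lam * mean(other) - rel_entropy(other, p))

# Crude grid search for sup over lam of lam * z - Lambda_p(lam).
grid = [i / 1000.0 for i in range(-5000, 5001)]
best = max(l * z - log_mgf(p, l) for l in grid)

print(lhs, rhs, jensen_gap, best)
```

The agreement of `lhs`, `rhs` and `best` is the finite-support analogue of $\Lambda^*(z) = I_p(z)$, while `jensen_gap` is nonnegative as the Jensen bound requires.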
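Next, note that $X$ is an irreducible, critical multitype Galton-Watson tree with offspring law

\[
Q\{c \mid b\} = p(n) \prod_{i=1}^{n} Q\{a_i \mid b\}, \quad \text{for } c = (n, a_1, \ldots, a_n).
\tag{4.7}
\]

We derive Theorem 2.4 from Theorems 2.1 and 2.3 by applying the contraction principle to the continuous linear mapping $F : \mathcal{M}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}^*) \to \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$, defined by

\[
F(\varpi, \nu)(a,b) = \varpi(a,b), \quad \text{for all } (\varpi, \nu) \in \mathcal{M}(\mathcal{X} \times \mathcal{X}) \times \mathcal{M}(\mathcal{X} \times \mathcal{X}^*) \text{ and } a,b \in \mathcal{X}.
\tag{4.8}
\]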

It is easy to see that on $\{|T| = n\}$ we have $\tilde L_X = \frac{n}{n-1} F(L_X, M_X) = \frac{n}{n-1} L_X$. It follows that conditioned on $\{|T| = n\}$ the random variables $\tilde L_X$ are exponentially equivalent to $L_X$, hence $\tilde L_X$ satisfy the same large deviation principle as $F(L_X, M_X)$, see [DZ98, Theorem 4.2.13]. Without loss of generality we restrict the space for the large deviation principle of $\tilde L_X$ to the set of all probability vectors on $\mathcal{X} \times \mathcal{X}$, see [DZ98, Lemma 4.1.5(b)].

Suppose $p$ has finite second moment. Then Theorem 2.1 implies the large deviation principle for $F(L_X, M_X)$ conditioned on $\{|T| = n\}$ with the good rate function $I(\mu) = \inf\{J(\mu, \nu) : F(\mu, \nu) = \mu\}$, see for example [DZ98, Theorem 4.2.1]. Convexity of $I$ follows easily from the linearity of $F$ and convexity of $J$.

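Turning to the proof of (2.10), recall that $\nu$ is sub-consistent if and only if $F(\varpi, \nu)(a,b) \ge \sum_{c \in \mathcal{X}^*} m(b,c)\,\nu(a,c)$ for all $a,b \in \mathcal{X}$. Hence, we have that

\[
I(\mu) = \inf\Big\{H(\nu \,\|\, \nu_1 \otimes Q) : F(\mu, \nu)(\cdot, \cdot) \ge \sum_{c \in \mathcal{X}^*} m(\cdot, c)\,\nu(\cdot, c),\ \nu_1 = \mu_2\Big\}.
\tag{4.9}
\]

Note that $\nu_1(a) = 0$ yields $\sum_b F(\mu, \nu)(a,b) = 0$ if $F(\mu, \nu)(\cdot, \cdot) = \sum_{c \in \mathcal{X}^*} m(\cdot, c)\,\nu(\cdot, c)$, and

\[
\sum_{(b,c) \in \mathcal{X} \times \mathcal{X}^*} m(b,c)\,\nu(a,c) \le 0 \quad \text{if } F(\mu, \nu)(\cdot, \cdot) \ge \sum_{c \in \mathcal{X}^*} m(\cdot, c)\,\nu(\cdot, c).
\]

Hence if $\mu_1(a) > 0 = \mu_2(a)$ for some $a \in \mathcal{X}$ then

\[
\Big\{\nu : F(\mu, \nu)(\cdot, \cdot) = \sum_{c \in \mathcal{X}^*} m(\cdot, c)\,\nu(\cdot, c),\ \nu_1 = \mu_2\Big\} \cup \Big\{\nu : F(\mu, \nu)(\cdot, \cdot) \ge \sum_{c \in \mathcal{X}^*} m(\cdot, c)\,\nu(\cdot, c),\ \nu_1 = \mu_2\Big\}
\]

is an empty set, and therefore $I(\mu) = \infty$. Assuming throughout the rest of the proof that $\mu_1 \ll \mu_2$, it is not hard to verify that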

/go = • 1 a} ) ' (4 - 10) 

aex ^ ' 

where for q G M(X*), ^(^^y> ff) = inf {%>> ?) : : * ^ M +> <K a ) ^ TT^J' for a11 « G ^} and 

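\[
I(\phi, q) := \inf\Big\{H(\nu \,\|\, q) : \nu \in \mathcal{M}(\mathcal{X}^*),\ \phi(b) = \sum_{c \in \mathcal{X}^*} m(b,c)\,\nu(c) \text{ for all } b \in \mathcal{X}\Big\}.
\tag{4.11}
\]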
Suppose now that $q(c) = p(n) \prod_{i=1}^{n} q(a_i)$ for all $c = (n, a_1, \ldots, a_n)$, where $q(\cdot)$ is a probability vector on $\mathcal{X}$ and $p(\cdot)$ a probability measure with mean one on the nonnegative integers, whose second moment is finite. Then, by Lemma 4.1 we have the representation

\[
I(\phi, q) = \|\phi\|\, H\big(\phi/\|\phi\| \,\big\|\, q\big) + I_p(\|\phi\|),
\tag{4.12}
\]

where $\|\phi\| := \sum_{b \in \mathcal{X}} \phi(b)$. Therefore, it suffices for us to show that

inf {/(*,$) : 4> : X -> R+, <j>(b) < §f , for all a e x) = ggi?(g^ || g) + (4.13) 

To do this, we write 

h{4>{b)) ■■= M 0) + u(b){<P(b) - for 6 S AT, 

where a is a Lagrange multiplier. Then, elementary calculus shows that a{b) is the solution of the 
equation 

IJOM) -t^y e - a ^q(a) = 

112(a)' 112(a) Z-^ Hy * 

a^X 

and that <fi(b) = ^(a) 1S ^ ne rninimizer of our constraint optimization problem. Writing <f>(b) = 

in (4.12) we obtain the left side of (4.13), which proves the theorem in the case of $p$ with unbounded support and finite second moment.

4.3 Proof of Corollary 2.5. Recall that $T$ is a Galton-Watson tree with offspring law $p(\ell) = 2^{-(\ell+1)}$, $\ell = 0, 1, \ldots$. Also, we recall that $X$ is a Markov chain indexed by $T$ with arbitrary initial distribution and transition kernel $Q$. Then $X$ satisfies all assumptions of Theorem 2.4; in particular we have $\sum_{\ell=0}^{\infty} \ell^2 p(\ell) = 3 < \infty$. Therefore, by Theorem 2.4, the empirical pair measure of $X$ conditioned on the events $\{|T| = n\}$ satisfies a large deviation principle in $\mathcal{M}(\mathcal{X} \times \mathcal{X})$ with good, convex rate function

/M _/^li«»<»+E»w*(^) (4 . 14) 

I, 00 otherwise, 
where $I_p(x) = \sup_{\lambda \in \mathbb{R}} \{\lambda x + \log(2 - e^{\lambda})\}$. Elementary calculus shows that

\[
\sup_{\lambda \in \mathbb{R}} \{\lambda x + \log(2 - e^{\lambda})\} = x \log x - (x+1) \log \frac{x+1}{2}.
\tag{4.15}
\]

Therefore, writing (4.15) in (4.14) and rearranging terms we obtain the form of the rate function in the corollary, which completes the proof.
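The two computations used in this proof, the second moment of $p(\ell) = 2^{-(\ell+1)}$ and the closed form (4.15), can be checked numerically. The following is a minimal sketch; the function names, the grid bounds and the truncation of the series at 200 terms are choices made for the illustration, and the supremum is searched only over $\lambda < \log 2$, where the logarithm is defined.

```python
import math


def legendre_numeric(x, steps=200000):
    """Grid search for sup over lam < log 2 of lam * x + log(2 - exp(lam))."""
    lo, hi = -20.0, math.log(2.0)
    best = -math.inf
    for i in range(steps):
        lam = lo + (hi - lo) * i / steps  # stays strictly below log 2
        best = max(best, lam * x + math.log(2.0 - math.exp(lam)))
    return best


def legendre_closed(x):
    """Right-hand side of (4.15): x log x - (x + 1) log((x + 1) / 2)."""
    return x * math.log(x) - (x + 1) * math.log((x + 1) / 2)


# Second moment of the geometric law p(l) = 2**-(l+1), series truncated at 200 terms.
second_moment = sum(l * l * 2.0 ** -(l + 1) for l in range(200))
print(second_moment)

for x in (0.5, 1.0, 2.0):
    print(x, legendre_numeric(x), legendre_closed(x))
```

The maximiser is $\lambda = \log\frac{2x}{1+x}$, obtained by setting the derivative $x - e^{\lambda}/(2 - e^{\lambda})$ to zero; substituting it back gives the right-hand side of (4.15).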

Acknowledgments: I thank P. Morters for helpful discussions on how to obtain large deviation lower 
bounds by conditioning, and the two referees (for my first submission) for many suggestions which 
have helped to improve the article. 